Knowledge Annotation and Visualization
Summarizing Knowledge using Network Analysis
Similarly to how the Clinical Knowledge Graph (CKG) can be used to annotate a list of proteins based on their connections, CKG can also annotate a list of drugs. CKG generates a comprehensive graph with all the connections to Diseases, Drugs, Protein targets, Protein Complexes, Pathways and Side effects.
All the connections extracted from CKG are then summarized into a smaller subgraph containing only the top nodes of each type (Disease, Drug, Complex, Pathway, Side effects, Publications), ranked with network analysis algorithms (betweenness and closeness centrality, PageRank).
The connections extracted from the graph are:
Drug-drug interactions
Drug-disease indications
Drug-target associations
Target-disease associations
Target-complex associations
Drug-pathway annotations
Target-pathway annotations
Protein-publication mentions
Disease-publication mentions
These connections are extracted using the queries in report_manager/queries/knowledge_annotation.yml and can be easily extended following the same query format.
Here, we show several examples of how to extract and visualize knowledge for a list of drugs.
[1]:
import pandas as pd
from ckg.report_manager import knowledge
Annotation of Proteins Linked to a Specific Disease
We use the Open Targets platform https://www.targetvalidation.org/ to obtain lists of genes associated with Fibromyalgia. Open Targets compiled a list of 57 protein targets that are associated with Fibromyalgia (https://www.targetvalidation.org/disease/EFO_0005687/associations?fcts=datatype:known_drug).
Fibromyalgia is a medical condition characterized by chronic widespread pain and a heightened pain response to pressure. Other symptoms include tiredness to a degree that normal activities are affected, sleep problems and troubles with memory (source: https://en.wikipedia.org/wiki/Fibromyalgia).
We feed the list of proteins to CKG to prioritize all the knowledge gathered in the graph to reveal relationships to other possibly related diseases as well as possible treatments and altered biological processes and pathways.
[2]:
drug_list = ['Cefadroxil','Bronopol',
'Pyritinol','Menadione',
'Idarubicin','Proscillaridin',
'PF-04691502', 'Sulisobenzone',
'Tolnaftate', 'Uracil mustard',
'Racecadotril', 'Atracurium besylate',
'Galantamine', 'Sulfanitran',
'Hydroquinidine', 'Thiamine',
'Levofloxacin', 'Gefitinib']
Knowledge Object
To annotate the list of drugs, we create an empty object of type Knowledge.
Once we have the object, we can simply call the function annotate_list(), specifying the list of drugs, optionally a disease (or diseases), and which types of entities we want to annotate (Disease, Drug, Pathway, etc.).
[3]:
#Create Knowledge object
kn = knowledge.Knowledge(identifier='List_of_drugs', data=None)
[4]:
# Annotate the list of drugs using the function annotate_list
kn.annotate_list(query_list=drug_list,  # list of drugs
                 entity_type='drug',    # type of items in the list
                 queries_file=None,     # YAML file with customized queries, or None for the default
                 attribute='name',      # what we provide in the list (name, id)
                 diseases=[],           # list of diseases
                 entities=None)         # types of annotations (Disease, Drug, Pathway, etc.)
This function runs all the queries in queries_file (default: report_manager/queries/knowledge_annotation.yml) associated with the entity_type (here, drug) and limits the queried information to relationships involving the drugs in the provided list.
Summarization and Visualization
The graph contains millions of relationships, and the results of the annotation may be too cumbersome to interpret directly.
To summarize the results and make them easier to understand and navigate, CKG uses network analysis algorithms (betweenness centrality, closeness centrality and PageRank) to prioritize the nodes in the knowledge annotation graph.
The result summarizes the relationships of the top nodes (15 by default) of each entity type (Disease, Drug, Pathway, Biological_process, Complex, Publication) according to these algorithms.
The summarized results can be visualized either as a Sankey plot or as a network.
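The prioritization step described above can be illustrated with plain NetworkX. This is a minimal sketch, not CKG's actual implementation: the helper names `top_nodes_per_type` and `summarize`, the `type` node attribute, and the `disease_A`/`disease_B`/`disease_C` placeholder nodes are all assumptions made for illustration. It scores every node with the chosen algorithm, keeps the highest-scoring nodes of each type, and returns the subgraph they induce.

```python
import networkx as nx


def top_nodes_per_type(graph, method="betweenness", num_nodes=15):
    """Rank nodes by a centrality score and keep the best ones per node type."""
    scorers = {
        "betweenness": nx.betweenness_centrality,
        "closeness": nx.closeness_centrality,
        "pagerank": nx.pagerank,
    }
    scores = scorers[method](graph)
    by_type = {}
    for node, node_type in graph.nodes(data="type"):
        by_type.setdefault(node_type, []).append(node)
    # Sort by descending score; break ties alphabetically for reproducibility.
    return {
        node_type: sorted(nodes, key=lambda n: (-scores[n], n))[:num_nodes]
        for node_type, nodes in by_type.items()
    }


def summarize(graph, method="betweenness", num_nodes=15):
    """Return the subgraph induced by the top nodes of every type."""
    top = top_nodes_per_type(graph, method, num_nodes)
    keep = {node for nodes in top.values() for node in nodes}
    return graph.subgraph(keep).copy()


# Toy graph: the drug names come from the drug list, while the disease
# nodes and all edges to them are hypothetical placeholders.
toy = nx.Graph()
toy.add_nodes_from(["Galantamine", "Gefitinib", "Atracurium besylate"], type="Drug")
toy.add_nodes_from(["disease_A", "disease_B", "disease_C"], type="Disease")
toy.add_edges_from([
    ("Galantamine", "Atracurium besylate"),
    ("Galantamine", "Gefitinib"),
    ("Galantamine", "disease_A"),
    ("Galantamine", "disease_B"),
    ("Gefitinib", "disease_A"),
    ("Atracurium besylate", "disease_C"),
])

top = top_nodes_per_type(toy, method="betweenness", num_nodes=1)
summary = summarize(toy, method="betweenness", num_nodes=1)
print(top)
print(summary.edges())
```

Keeping a fixed number of nodes per type, rather than a global top list, prevents one abundant entity type (for example publications) from crowding out the others in the summary.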
[5]:
kn.generate_report(visualizations=['network', 'sankey'],  # how to visualize the results (network, sankey)
                   summarize=True,                        # whether or not to summarize the annotation
                   method='betweenness',                  # method for summarizing the annotation (betweenness, closeness, pagerank)
                   inplace=True,                          # if True, the summarized graph is saved; otherwise the full graph is kept
                   num_nodes=20)                          # number of top nodes to be used in the visualization (default 15)
[6]:
kn.report.visualize_report(environment='notebook')[0]
All the Knowledge is Accessible
All the relationships extracted from the CKG are stored as a dataframe in the class property data.
[7]:
kn.data.shape
[7]:
(10750, 7)
[8]:
kn.data.head()
[8]:
| | r.source | rel_type | source | source_type | target | target_type | weight |
|---|---|---|---|---|---|---|---|
| 0 | NaN | INTERACTS_WITH | Atracurium besylate | [Drug] | Galantamine | [Drug] | NaN |
| 1 | NaN | INTERACTS_WITH | Atracurium besylate | [Drug] | Proscillaridin | [Drug] | NaN |
| 2 | NaN | INTERACTS_WITH | Cefadroxil | [Drug] | Levofloxacin | [Drug] | NaN |
| 3 | NaN | INTERACTS_WITH | Galantamine | [Drug] | Atracurium besylate | [Drug] | NaN |
| 4 | NaN | INTERACTS_WITH | Galantamine | [Drug] | Gefitinib | [Drug] | NaN |
[9]:
kn.data.tail()
[9]:
| | r.source | rel_type | source | source_type | target | target_type | weight |
|---|---|---|---|---|---|---|---|
| 9293 | None | MENTIONED_IN_PUBLICATION | Uracil mustard | [Drug] | PMID:26048278 | [Publication] | NaN |
| 9294 | None | MENTIONED_IN_PUBLICATION | Uracil mustard | [Drug] | PMID:29596642 | [Publication] | NaN |
| 9295 | None | MENTIONED_IN_PUBLICATION | Uracil mustard | [Drug] | PMID:19001432 | [Publication] | NaN |
| 9296 | None | MENTIONED_IN_PUBLICATION | Uracil mustard | [Drug] | PMID:30845999 | [Publication] | NaN |
| 9297 | None | MENTIONED_IN_PUBLICATION | Uracil mustard | [Drug] | PMID:25383193 | [Publication] | NaN |
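Because the relationships are held in a regular pandas dataframe, they can be explored with standard pandas operations. The sketch below does not query CKG: it rebuilds a small toy table from a few of the rows printed above (keeping only some of the columns) and shows how one might count relationships per type and list the interaction partners of each drug. The same calls should apply to kn.data, assuming it has the columns shown in the output.

```python
import pandas as pd

# Toy subset of the relationships table, copied from the head()/tail() output.
toy = pd.DataFrame(
    [
        ("INTERACTS_WITH", "Atracurium besylate", "Galantamine", "Drug"),
        ("INTERACTS_WITH", "Atracurium besylate", "Proscillaridin", "Drug"),
        ("INTERACTS_WITH", "Cefadroxil", "Levofloxacin", "Drug"),
        ("INTERACTS_WITH", "Galantamine", "Atracurium besylate", "Drug"),
        ("INTERACTS_WITH", "Galantamine", "Gefitinib", "Drug"),
        ("MENTIONED_IN_PUBLICATION", "Uracil mustard", "PMID:26048278", "Publication"),
        ("MENTIONED_IN_PUBLICATION", "Uracil mustard", "PMID:29596642", "Publication"),
    ],
    columns=["rel_type", "source", "target", "target_type"],
)

# Number of relationships of each type.
counts = toy["rel_type"].value_counts()

# Drug-drug interactions only, grouped into one partner list per source drug.
drug_drug = toy[toy["rel_type"] == "INTERACTS_WITH"]
partners = drug_drug.groupby("source")["target"].apply(list)

print(counts)
print(partners)
```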
The generated knowledge subgraph can also be accessed as a NetworkX directed graph (DiGraph).
[10]:
kn.graph
[10]:
<networkx.classes.digraph.DiGraph at 0x1e5d29cbd48>
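A DiGraph can be queried with the usual NetworkX methods. As an illustration only, the sketch below builds a small directed graph from the drug-drug interaction rows printed earlier (it is a stand-in for kn.graph, whose node and edge attributes are not shown here) and then inspects the neighbours of one drug.

```python
import networkx as nx
import pandas as pd

# Edge table copied from the drug-drug interaction rows shown earlier.
edges = pd.DataFrame(
    {
        "source": ["Atracurium besylate", "Atracurium besylate", "Cefadroxil",
                   "Galantamine", "Galantamine"],
        "target": ["Galantamine", "Proscillaridin", "Levofloxacin",
                   "Atracurium besylate", "Gefitinib"],
        "rel_type": ["INTERACTS_WITH"] * 5,
    }
)

# Build a directed graph, storing the relationship type on each edge.
g = nx.from_pandas_edgelist(
    edges,
    source="source",
    target="target",
    edge_attr="rel_type",
    create_using=nx.DiGraph,
)

# Drugs that Galantamine points to, and the type of one of those edges.
galantamine_targets = sorted(g.successors("Galantamine"))
edge_type = g["Galantamine"]["Gefitinib"]["rel_type"]

print(g.number_of_nodes(), g.number_of_edges())
print(galantamine_targets, edge_type)
```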
The report can also be downloaded to a specified directory. The directory will contain the Sankey visualization in png and svg formats, the network in gml and json formats, as well as the nodes and edges (relationships) tables in tsv format.
[11]:
kn.report.download_report('tmp/List_of_drugs')
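For readers who want to export a graph themselves, for example after filtering kn.graph, the sketch below writes a NetworkX graph to GML and to node-link JSON. It is an assumption-laden illustration rather than a description of what download_report does internally: the helper `export_graph`, the file name `knowledge_graph` and the `relation` edge attribute are hypothetical.

```python
import json
import tempfile
from pathlib import Path

import networkx as nx
from networkx.readwrite import json_graph


def export_graph(graph, out_dir, name="knowledge_graph"):
    """Write `graph` as <name>.gml and <name>.json inside `out_dir`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    gml_path = out_dir / f"{name}.gml"
    json_path = out_dir / f"{name}.json"
    nx.write_gml(graph, str(gml_path))
    with open(json_path, "w") as handle:
        json.dump(json_graph.node_link_data(graph), handle, indent=2)
    return gml_path, json_path


# Small stand-in graph using two interactions from the table shown earlier.
g = nx.DiGraph()
g.add_edge("Galantamine", "Gefitinib", relation="INTERACTS_WITH")
g.add_edge("Cefadroxil", "Levofloxacin", relation="INTERACTS_WITH")

with tempfile.TemporaryDirectory() as tmp:
    gml_path, json_path = export_graph(g, tmp)
    # Read both files back to confirm the round trip.
    reloaded = nx.read_gml(str(gml_path))
    with open(json_path) as handle:
        node_link = json.load(handle)

print(reloaded.number_of_edges(), len(node_link["nodes"]))
```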